1 


Recursive  £i)00  Group  lasso 

Yilun  Chen,  Student  Member,  IEEE,  and  Alfred  O.  Hero,  III,  Fellow,  IEEE 


Abstract — We  introduce  a  recursive  adaptive  group  lasso  al¬ 
gorithm  for  real-time  penalized  least  squares  prediction  that 
produces  a  time  sequence  of  optimal  sparse  predictor  coefficient 
vectors.  At  each  time  index  the  proposed  algorithm  computes  an 
exact  update  of  the  optimal  U,oo -penalized  recursive  least  squares 
(RLS)  predictor.  Each  update  minimizes  a  convex  but  non- 
differentiable  function  optimization  problem.  We  develop  an  on¬ 
line  homotopy  method  to  reduce  the  computational  complexity. 
Numerical  simulations  demonstrate  that  the  proposed  algorithm 
outperforms  the  1 1  regularized  RLS  algorithm  for  a  group  sparse 
system  identification  problem  and  has  lower  implementation 
complexity  than  direct  group  lasso  solvers. 

Index  Terms — RLS,  group  sparsity,  mixed  norm,  homotopy, 
group  lasso,  system  identification 

I.  Introduction 

Recursive  Least  Squares  (RLS)  is  a  widely  used  method  for 
adaptive  filtering  and  prediction  in  signal  processing  and  re¬ 
lated  fields.  Its  applications  include:  acoustic  echo  cancelation; 
wireless  channel  equalization;  interference  cancelation  and 
data  streaming  predictors.  In  these  applications  a  measurement 
stream  is  recursively  fitted  to  a  linear  model,  described  by 
the  coefficients  of  an  FIR  prediction  filter,  in  such  a  way  to 
minimize  a  weighted  average  of  squared  residual  prediction 
errors.  Compared  to  other  adaptive  filtering  algorithms  such 
as  Least  Mean  Square  (LMS)  filters,  RLS  is  popular  because 
of  its  fast  convergence  and  low  steady-state  error. 

In  many  applications  it  is  natural  to  constrain  the  predictor 
coefficients  to  be  sparse.  In  such  cases  the  adaptive  FIR 
prediction  filter  is  a  sparse  system:  only  a  few  of  the  impulse 
response  coefficients  are  non-zero.  Sparse  systems  can  be 
divided  into  general  sparse  systems  and  group  sparse  systems 

[1] ,  [2],  Unlike  a  general  sparse  system,  whose  impulse  re¬ 
sponse  can  have  arbitrary  sparse  structure,  a  group  sparse  sys¬ 
tem  has  impulse  response  composed  of  a  few  distinct  clusters 
of  non-zero  coefficients.  Examples  of  group  sparse  systems 
include  specular  multipath  acoustic  and  wireless  channels  [3], 

[4]  and  compressive  spectrum  sensing  of  narrowband  sources 

[5] . 

The  exploitation  of  sparsity  to  improve  prediction  perfor¬ 
mance  has  attracted  considerable  interest.  For  general  sparse 
systems,  the  I\  norm  has  been  recognized  as  an  effective 
promotor  of  sparsity  [6],  [7].  In  particular,  l\  regularized  LMS 

[2] ,  [8]  and  RLS  [9],  [10]  algorithms  have  been  proposed 
for  for  sparsification  of  adaptive  filters.  For  group  sparse 
systems,  mixed  norms  such  as  the  norm  and  the  £ii00 
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norm  have  been  applied  to  promote  group-level  sparsity  in 
statistical  regression  [  1 1]— [13],  commonly  referred  to  as  the 
group  lasso,  and  sparse  signal  recovery  in  signal  processing 
and  communications  [1],  [14].  In  [2],  the  authors  proposed  a 
family  of  LMS  filters  with  convex  regularizers.  As  a  specific 
scenario,  they  considered  group  sparse  LMS  filters  with  2- 
type  regularizers  and  empirically  demonstrated  the  superior 
performance  of  t-\  ,2 -regularized  LMS  over  the  (\ -regularized 
LMS  for  identifying  unkonwn  group-sparse  channels. 

In  this  paper,  we  develop  group  sparse  RLS  algorithms 
for  real-time  system  identification.  Specifically,  we  consider 
RLS  penalized  by  the  £ii00  norm  to  promote  group  sparsity, 
which  we  call  the  recursive  (\  jOC  group  lasso.  The  algorithm  is 
based  on  the  homotopy  approach  to  solving  the  lasso  problem 
and  is  a  nontrivial  extension  of  [15]— [17]  to  group  sparse 
structure.  Both  the  and  l ijOC  regularizer  have  been  widely 
adopted  to  enforce  group  sparse  structure  in  regression  and 
other  problems.  While,  as  shown  in  [18],  in  some  cases  the 
1 12  is  more  effective  in  enforcing  group  sparsity,  we  adopt  the 
1 1 ,oo  norm  due  to  its  more  favorable  computational  properties. 
As  we  show  in  this  paper,  the  (ii00  regularizer  can  be  very 
efficiently  implemented  in  the  recursive  setting  of  RLS.  Our 
implementation  exploits  the  piecewise  linearity  of  the  (ii00 
norm  to  obtain  an  exact  solution  to  each  iteration  of  the  group 
sparsity-penalized  RLS  algorithm  using  the  homotopy  ap¬ 
proach.  Compared  to  other  contemporary  (\ jOC  and  A, 2  group 
lasso  solvers  [12],  [19],  our  simulation  results  demonstrate  that 
our  proposed  methods  attain  equivalent  or  better  performance 
but  at  lower  computational  cost  for  online  group  sparse  RLS 
problems. 

The  paper  is  organized  as  follows.  Section  II  formulates 
the  problem.  In  Section  III  we  develop  the  homotopy  based 
algorithm  to  solve  the  recursive  £i)00  group  lasso  in  an  online 
recursive  manner.  Section  IV  provides  numerical  simulation 
results  and  Section  V  summarizes  our  principal  conclusions. 
The  proofs  of  theorems  and  some  details  of  the  proposed 
algorithm  are  provided  in  Appendix. 

Notations :  In  the  following,  matrices  and  vectors  are  de¬ 
noted  by  boldface  upper  case  letters  and  boldface  lower  case 
letters,  respectively;  (-)T  denotes  the  transpose  operator,  and 
||  •  ||i  and  ||  •  | loo  denote  the  I\  and  norm  of  a  vector, 
respectively;  for  a  set  A,  A  denotes  its  cardinality  and  <f> 
denotes  the  empty  set;  x_4  denotes  the  sub-vector  of  x  from 
the  index  set  A  and  R_4g  denotes  the  sub-matrix  of  R  formed 
from  the  row  index  set  A  and  column  index  set  B. 

II.  Problem  formulation 
A.  Recursive  Least  Squares 

Let  w  be  a  p-dimensional  coefficient  vector.  Let  y  be  an 
n-dimensional  vector  comprised  of  observations  {yj}j=1.  Let 
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{xj}"=1  be  a  sequence  of  p- dimensional  predictor  variables. 
In  standard  adaptive  filtering  terminology,  r/j,  Xj  and  w  are 
the  primary  signal,  the  reference  signal,  and  the  adaptive  filter 
weights.  The  RLS  algorithm  solves  the  following  quadratic 
minimization  problem  recursively  over  time  n  =  p,p  +  l, . . 


w, 


(7  Jx,x? 


3=1 


and 


l'n  =  J2^n  ‘X'!lr 

3=1 
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=  argminy'7”  J(t/j  -  WTXj)2,  (1) 

W  Z - ' 

3= 1 


where  7  £  (0, 1]  is  the  forgetting  factor  controlling  the  trade¬ 
off  between  transient  and  steady-state  behaviors. 

To  serve  as  a  template  for  the  sparse  RLS  extensions 
described  below  we  briefly  review  the  RLS  update  algorithm. 
Define  R„  and  r„  as 
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(3) 


The  solution  w„  to  (1)  can  be  then  expressed  as 

w„  =  R"1^.  (4) 

The  matrix  Rn  and  the  vector  r„  are  updated  as 
R.':  =  7R/t  —  1  T  X„x„ , 

and 

rn  =  7rn-l  +  XnVn- 

Applying  the  Sherman-Morrison- Woodbury  formula  [20], 


2 
0 

-2 

1 

0.2 

0 

_0'20  1  00  200  300  400  500 

(b) 

Fig.  1.  Examples  of  (a)  a  general  sparse  system  and  (b)  a  group-sparse 
system. 


where  {Gm}m=i  *s  a  grouP  partition  of  the  index  set  Q  = 

M 

|^_J  Gm  =  G 1  Gm  n  Gm'  =  (f>  if  TYl  ^  777-  , 

m=  1 

and  wgm  is  a  sub-vector  of  w  indexed  by  Qrn.  The  l\  x  norm 
is  a  mixed  norm:  it  encourages  correlation  among  coefficients 
inside  each  group  via  the  norm  within  each  group  and 
promotes  sparsity  across  each  group  using  the  l\  norm.  The 
mixed  norm  ||w||ij00  is  convex  in  w  and  reduces  to  ||w||i 
when  each  group  contains  only  one  coefficient,  i.e.,  \Gi\  = 
1^2 1  =  •  •  •  =  \Gm\  =  1- 

The  £i>00  group  lasso  solves  the  following  penalized  least 
squares  problem: 


Rn1=7  ^n- 1-7  10‘nSnSl, 

(5) 

1  n 

wn  =  arg  min  -  T'  7n_J  (jjj  -  wTxjf  +  A||w||1|tx),  (10) 
w  Z  z — 

3= 1 

where 

Sn  =  Rra_ixn 

(6) 

and 

1 

|  T  ’ 

7  +  xn  gn 

(7) 

where  A  is  a  regularization  parameter  and  as  in  standard  RLS 
7  controls  the  trade-off  between  the  convergence  rate  and 
steady-state  performance.  Eq.  (10)  is  a  convex  problem  and 

Substituting  (5)  into  (4),  we  obtain  the  weight  update  [21] 

wn_\  -f-  crngnen,  (8) 

where 


can  be  solved  by  standard  convex  optimizers  or  path  tracing 
algorithms  [12].  Direct  solution  of  (10)  has  computational 
complexity  of  0(p3). 


en  =  Un  -  Wn_!Xn. 


(9)  C.  Recursive  £it00  group  lasso 


Equations  (5)-(9)  define  the  RLS  algorithm  which  has  com¬ 
putational  complexity  of  order  0{p2). 

B.  Non-recursive  i\t0o  group  lasso 

The  £ij00  group  lasso  is  a  regularized  least  squares  approach 
which  uses  the  £ijOC  mixed  norm  to  promote  group-wise  sparse 
pattern  on  the  predictor  coefficient  vector.  The  £lj00  norm  of 
a  vector  w  is  defined  as 

M 

IMkoo  =  HwsJI°o, 

m=  1 


In  this  subsection  we  obtain  a  recursive  solution  for  (10) 
that  gives  an  update  w„  from  w„_i.  The  approach  taken  is 
a  group-wise  generalization  of  recent  works  [15],  [16]  that 
uses  the  homotopy  approach  to  sequentially  solve  the  lasso 
problem.  Using  the  definitions  (2)  and  (3),  the  problem  (10) 
is  equivalent  to 

wn  =  argmin  TTRnw  -  wTrn  +  A||w||i 

W  Z 

=  argmin \wT  kRn-i  +  x^x„)  w 

W  Z 

-  wT(7rn_i  +  xnyn)  +  A|| w|| ij0O- 
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Let  /(/3 ,  A)  be  the  solution  to  the  following  parameterized 
problem 

f(P,  A)  =  argmin  ^wT  (7^,-1  +  /3 x„x^)  w 

w  2  (12) 

-  wT(7r„_i  +  /3x„yn)  +  A||w||ii00 

where  /?  is  a  constant  between  0  and  1.  w„  and  w„_i  of 
problem  (11)  can  be  expressed  as 

wn_!  =  /( 0,7A), 

and 

wn  =  /( 1,  A). 

Our  proposed  method  computes  wn  from  wn_  |  in  the  follow¬ 
ing  two  steps: 

Step  1.  Fix  /3  =  0  and  calculate  /( 0,A)  from  /( 0, 7A).  This 
is  accomplished  by  computing  the  regularization  path  between 
7A  and  A  using  homotopy  methods  introduced  for  the  non¬ 
recursive  li jOC  group  lasso.  The  solution  path  is  piecewise 
linear  and  the  algorithm  is  described  in  [12]. 

Step  2.  Fix  A  and  calculate  the  solution  path  between  /( 0,  A) 
and  /( 1,  A).  This  is  the  key  problem  addressed  in  this  paper. 

To  ease  the  notations  we  denote  x„  and  yn  by  x  and  y, 
respectively,  and  define  the  following  variables: 

R(/3)  =  7Rn-l  +  /?xxT  (13) 

r(/3)  =  7rn-i  +  /3xy.  (14) 

Problem  (12)  is  then 

f(/3,  A)  =  argmin  iwTR(/?)w-wTr(^)  +  A||w||l  oo.  (15) 

W  Z 

In  Section  III  we  will  show  how  to  propagate  /( 0,  A)  to  /( 1,  A) 
using  the  homotopy  approach  applied  to  (15). 

III.  Online  homotopy  update 
A.  Set  notation 


ABC 


V  Q 


Fig.  2.  Illustration  of  the  partitioning  of  a  20  element  coefficient  vector  w 
into  5  groups  of  4  indices.  The  sets  V  and  Q  contain  the  active  groups  and 
the  inactive  groups,  respectively.  Within  each  of  the  two  active  groups  the 
maximal  coefficients  are  denoted  by  the  dark  red  color. 

B.  Optimality  condition 

The  objective  function  in  (15)  is  convex  but  non-smooth 
as  the  floo  norm  is  non-differentiable.  Therefore,  problem 
(15)  reaches  its  global  minimum  at  w  if  and  only  if  the  sub¬ 
differential  of  the  objective  function  contains  the  zero  vector. 
Let  <9||w||ij00  denote  the  sub-differential  of  the  fij00  norm  at 
w.  A  vector  z  £  <9||w||ij00  only  if  z  satisfies  the  following 
conditions  (details  can  be  found  in  [12],  [14]): 


llzujli  =  1  ,m£  V, 

(16) 

sgn  (zAm )  =  sgn  (w^m  ),m£V, 

(17) 

zg  =  0, 

(18) 

||zcm||i  <  1  ,m£  Q, 

(19) 

where  A,B,C,V  and  Q  are  /3-dependent  sets  defined  on  w 
as  defined  in  Section  III-A. 

For  notational  convenience  we  drop  /3  in  R(/3)  and  r(/3) 
leaving  the  /3-dependency  implicit.  The  optimality  condition 
is  then  written  as 


We  begin  by  introducing  a  series  of  set  definitions.  Figure 
2  provides  an  example.  We  divide  the  entire  group  index  set 
into  V  and  Q,  respectively,  where  V  contains  active  groups 
and  Q  is  its  complement.  For  each  active  group  to  £  V ,  we 
partition  the  group  into  two  parts:  the  maximal  values,  with 
indices  Am,  and  the  rest  of  the  values,  with  indices  Brn : 

Am  =  arg  max  \wi\,m  £  V , 

i&Sm 

and 

Bm  Qm  Am . 


The  set  A  and  B  are  defined  as  the  union  of  the  Arn  and  Brn 
sets,  respectively: 

A=  \J  Am,  B  =  U  Bm. 

mG'P  mG? 


Finally,  we  define 


and 


C=  (J  Qm 

m£  Q 


=  Gmnc. 


Rw  —  r  +  Az  =  0,  z  £  <9||w||li00.  (20) 

As  wc  =  0  and  zg  =  0,  (20)  implies  the  three  conditions 

+  R^gWg  -  +  Az_4  =  0,  (21) 

+  Rsswg  rg  =  0,  (22) 

R-caw.4  +  RCgwg  -  rc  +  Azc  =  0.  (23) 


The  vector  w_4  lies  in  a  low  dimensional  subspace.  Indeed, 
by  definition  of  Am.  if  \Am\  >  1 

H  =  K'|,  i,i' £  Am. 

Therefore,  for  any  active  group  m  £  V, 


where 


and 


3V_4;,(  ■~’_4  m  I*  /, 


1 1  I !  00  a 


(24) 


s.a  =  sgn  (w^) . 

Using  matrix  notation,  we  represent  (24)  as 


w_4  =  Sa. 


(25) 
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where 


is  a  \A\  x  \V\  sign  matrix  and  the  vector  a  is  comprised  of 

am,m  £  V. 

The  solution  to  (15)  can  be  determined  in  closed  form  if  the 
sign  matrix  S  and  sets  (A,B,C,V ,  Q)  are  available.  Indeed, 
from  (16)  and  (17) 

Stza  =  1,  (27) 


where  1  is  a  \V\  x  1  vector  comprised  of  l’s.  With  (25)  and 
(27),  (21)  and  (22)  are  equivalent  to 

STR^Sa  +  StHabwb  -  St1’.a  +  A1  =  0, 

(2oJ 

RfixlSa  +  Rsewg  —  rg  =  0. 


Therefore,  by  defining  the  (a.s.  invertible)  matrix 

_  (straas  StR_4B\ 

^  RBAS  Rbb  )' 


(29) 


and 


(30) 


(28)  is  equivalent  to  Hv  =  b  Ae,  where  e  =  (1T,  0T)T,  so 
that 


v  =  H-1(b  —  Ae).  (31) 


As  w_4  =  Sa,  the  solution  vector  w  can  be  directly  obtained 
from  v  via  (30).  For  the  sub-gradient  vector,  it  can  be  shown 
that 


Az_4  =  rA  -  (R^S  R^g)  v, 
zg  =  0 


(32) 

(33) 


and 

Az c  =  rg  —  (RcxiS  Rcb)  v.  (34) 


C.  Online  update 

Now  we  consider  (15)  using  the  results  in  III-B.  Let  0q  and 
0i  be  two  constants  such  that  0i  >  0 o-  For  a  given  value  of 
0  €  [0o,0i]  define  the  class  of  sets  S  =  (A,B,C,V,  Q)  and 
make  0  explicit  by  writing  S(0).  Recall  that  S(0)  is  specified 
by  the  solution  f(0,  A)  defined  in  (19).  Assume  that  S(0)  does 
not  change  for  /3  £  [/?0,  0i}-  The  following  theorem  propagates 
/(/?o,A)  to  ./(/3i ,  A)  via  a  simple  algebraic  relation. 


and 

Az c  =  Az c  +  2^°  (: y  -  y)  {xc  -  (RotS  Rgg)g}  , 

1  +  vHP  i 

(37) 

where  R  =  R(0)  as  defined  in  (13),  (x,  y)  is  the  new  sample 
as  defined  in  (13)  and  (14),  the  sign  matrix  S  is  obtained  from 
the  solution  at  (3  =  0o,  Hq  is  calculated  from  (29)  using  S 
and  R(0),  and  d,  u,  y  and  <72H  are  defined  by 


=(sy), 

(38) 

\  ) 

g  =  HqM. 

(39) 

y  =  dTv, 

(40) 

<J2h  =  d1  g. 

(41) 

The  proof  of  Theorem  1  is  provided  in  Appendix  A. 
Theorem  1  provides  the  closed  form  update  for  the  solution 
path  /(/ So,  A)  — ►  /(/?i,A),  under  the  assumption  that  the 
associated  sets  S(0)  remain  unaltered  over  the  path. 

Next,  we  partition  the  range  0  £  [0, 1]  into  contiguous 
segments  over  which  S(0)  is  piecewise  constant.  Within  each 
segment  we  can  use  Theorem  1  to  propagate  the  solution  from 
left  endpoint  to  right  endpoint.  Below  we  specify  an  algorithm 
for  finding  the  endpoints  of  each  of  these  segments. 

Fix  an  endpoint  0o  of  one  of  these  segments.  We  seek  a 
critical  point  0i  that  is  defined  as  the  maximum  0\  ensuring 
S(0)  remains  unchanged  within  [0o,0i\-  By  increasing  0\ 
from  0o,  the  sets  S(0)  will  not  change  until  at  least  one  of 
the  following  conditions  are  met: 

Condition  1.  There  exists  i  £  A  such  that  z\  =  0; 

Condition  2.  There  exists  i  £  Brn  such  that  u;'|  =  a'Tn ; 

Condition  3.  There  exists  m  £  V  such  that  o:'rn  =  0; 

Condition  4.  There  exists  m  £  Q  such  that  |z',  |  -|  =  1. 

Condition  1  is  from  (17)  and  (18),  Condition  2  and  3  are  based 
on  definitions  of  A  and  V,  respectively,  and  Condition  4  comes 
from  (16)  and  (19).  Following  [12],  [22],  the  four  conditions 
can  be  assumed  to  be  mutually  exclusive.  The  actions  with 
respect  to  Conditions  1-4  are  given  by 
Action  1.  Move  the  entry  i  from  A  to  B: 

A^  A~  {i},B  ^  BU{i}-, 

Action  2.  Move  the  entry  i  from  B  to  A: 

A^Au{i},B^B-{i}-, 


Theorem  1.  Let  0o  and  0\  be  two  constants  such  that  0i  >  0o 
and  for  any  0  £  [0o,0f\  the  solutions  to  (15)  share  the  same 
sets  S  =  (A,B,C,V,  Q).  Let  V  and  v  be  vectors  defined  as 
/(/3i,  A)  and  f(0o,X),  respectively.  Then 


and  the  corresponding  sub-gradient  vector  has  the  explicit 
update 

\z'A  =  \zA  +  — 2^—  (y  -  y )  {x-4  -  (R^S  R_4e)g} 

1  +  <XHP\ 

(36) 


Action  3.  Remove  group  m  from  the  active  group  list 
V  ■£-  V  -  {m},  Q  <-  Q  U  {m}, 
and  update  the  related  sets 

A  i —  A  —  Am  i  B  4 —  C  U  Am  j 

Action  4.  Select  group  m 

{m},  Q  -f-  Q  -  {m}, 
and  update  the  related  sets 

A  4 —  A  U  CmtB  i —  C  —  Cm  • 


V  =  v  + 


1  +  <72H01 


(y-y)  g, 


(35) 
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By  Theorem  1,  the  solution  update  from  /3q  to  3\  is  in 
closed  form.  The  critical  point  of  can  be  determined  in  a 
straightforward  manner  (details  are  provided  in  Appendix  B). 
Let  f3\  ,k  =  1 , . . . ,  4  be  the  minimum  value  that  is  greater 
than  /3o  and  meets  Condition  1-4,  respectively.  The  critical 
point  /?i  is  then 

Pi  =  min  (}\k\ 
fc=l,...,4 

D.  Homotopy  algorithm  implementation 

We  now  have  all  the  ingredients  for  the  homotopy  update 
algorithm  and  the  pseudo  code  is  given  in  Algorithm  1. 


Algorithm  1:  Homotopy  update  from  /( 0,  A)  to  /(1,A). 
Input  :  /(0,  A),R(0),x,y 
output:  /( 1,  A) 

Initialize  /?0  =  0,  Pi  =  0,  R  =  R(0); 

Calculate  (A,  B,  C,  V,  Q)  and  (v,Az^,Azc)  from 
/(0,  A); 

while  (30  <  1  do 

Calculate  the  environmental  variables 
(S,  H0,  d,  g,  y,  o2h)  from  /(/30,  A)  and  R; 

Calculate  {P[k^}f—i  that  meets  Condition  1-4, 
respectively; 

Calculate  the  critical  point  /3i  that  meets  Condition 
fc*:  /c*  =  argminfc/3^  and  fii  =  P[k^', 

if  [3 1  <  1  then 

Update  (v,  Az^,  Azc)  using  (35),  (36)  and  (37); 
Update  Q)  by  Action  &;*; 

Po  =  Pi\ 

else 

j  break; 
end 

end 

Pi  =  i; 

Update  (v,  Az_4,Azc)  using  (35); 

Calculate  /(1,A)  from  v. 


Next  we  analyze  the  computational  cost  of  Algorithm  1 .  The 
complexity  to  compute  each  critical  point  is  summarized  in 
Table  I,  where  N  is  the  dimension  of  H0.  As  N  =  l^l  +  ISI  < 
|Al|  +  \B\,  N  is  upper  bounded  by  the  number  of  non-zeros  in 
the  solution  vector.  The  vector  g  can  be  computed  in  0(N2) 
time  using  the  matrix-inverse  lemma  [20]  and  the  fact  that, 
for  each  action.  Ho  is  at  most  perturbed  by  a  rank-two  matrix. 
This  implies  that  the  computation  complexity  per  critical  point 
is  0(pma,x{N,  logp})  and  the  total  complexity  of  the  online 
update  is  0(k2  •  pmaxjiV,  logp}),  where  k2  is  the  number 
of  critical  points  of  /3  in  the  solution  path  /( 0,  A)  — >  /( 1,  A). 
This  is  the  computational  cost  required  for  Step  2  in  Section 
II-C. 

A  similar  analysis  can  be  performed  for  the  complexity  of 
Step  1,  which  requires  0(ki  ■  pmax{N,  logp})  where  k±  is 
the  number  of  critical  points  in  the  solution  path  /( 0,7A)  — >■ 
/( 0,  A).  Therefore,  the  overall  computation  complexity  of  the 
recursive  fijOC  group  lasso  is  0(k  ■  pmaxjiV,  logp}),  where 


S  =  Hn  id 

0(N2) 

~  (R-yt^S  R^s)g 

0(\A\N ) 

xc  -  (R-c.aS  Rce)g 

0(\C\N) 

w 5 

0(\A\) 

0(1-61 ) 

0(\V\) 

0(|C|log|C|) 

TABLE  I 

Computation  costs  of  online  homotopy  update  for  each 

CRITICAL  POINT. 


Fig.  3.  Responses  of  the  time  varying  system,  (a):  Initial  response,  (b): 
Response  after  the  200th  iteration.  The  groups  for  Algorithm  1  were  chosen 
as  20  equal  size  contiguous  groups  of  coefficients  partitioning  the  range 
1,...,100. 


k  =  k\  +  k2 ,  i.e.,  the  total  number  of  critical  points  in  the 
solution  path  f(0,y\)  — >■  /( 0,  A)  — >  /( 1,A). 

An  instructive  benchmark  is  to  directly  solve  the  n-samples 
problem  (12)  from  the  solution  path  /(l,oo)  (i.e.,  a  zero 
vector)  — >  /( 1,  A)  [12],  without  using  the  previous  solution 
w„_i.  This  algorithm,  called  iCap  in  [12],  requires  0(k'  ■ 
pmax{iV,  logp}),  where  k!  is  the  number  of  critical  points  in 
/( 1,  oo)  -A  /(l,  A).  Empirical  comparisons  between  k  and  k\ 
provided  in  the  following  section,  indicate  that  iCap  requires 
significantly  more  computation  than  our  proposed  Algorithm 
1. 

IV.  Numerical  simulations 

In  this  section  we  demonstrate  our  proposed  recursive  fij00 
group  lasso  algorithm  by  numerical  simulation.  We  simulated 
the  model  yj  =  w *Xj  +  vv  j  =  1, . . .  ,400,  where  v:/  is  a 
zero  mean  Gaussian  noise  and  w*  is  a  sparse  p  =  100  element 
vector  containing  only  14  non-zero  coefficients  clustered  be¬ 
tween  indices  29  and  42.  See  Fig.  3  (a).  After  200  time  units, 
the  locations  of  the  non-zero  coefficients  of  w*  is  shifted  to 
the  right,  as  indicated  in  Fig.  3  (b). 

The  input  vectors  were  generated  as  independent  identically 
distributed  Gaussian  random  vectors  with  zero  mean  and  iden¬ 
tity  covariance  matrix,  and  the  variance  of  observation  noise 
Vj  is  0.01.  We  created  the  groups  in  the  recursive  3i}00  group 
lasso  as  follows.  We  divide  the  100  RLS  filter  coefficients 
w  into  20  groups  with  group  boundaries  1,  5, 10, . . .,  where 
each  group  contains  5  coefficients.  The  forgetting  factor  7 
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Fig.  4.  Averaged  MSE  of  the  proposed  algorithm,  RLS  and  recursive  lasso.  Fig.  5.  Averaged  number  of  critical  points  for  the  proposed  recursive  method 
The  corresponding  system  response  is  shown  in  Fig.  3.  of  implementing  fi.oo  lasso  and  the  iCap  [12]  non-recursive  method  of 

implementation.  The  corresponding  system  response  is  shown  in  Fig.  3. 


and  the  regularization  parameter  A  were  set  to  0.9  and  0.1, 
respectively.  We  repeated  the  simulation  100  times  and  the 
averaged  mean  squared  errors  of  the  RLS,  sparse  RLS  and 
proposed  RLS  shown  in  Fig.  4.  We  implemented  the  standard 
RLS  and  sparse  RLS  using  the  l\  regularization,  where  the 
forgetting  factors  are  also  set  to  0.9.  As  [9],  [10]  and  [15] 
are  solving  the  same  online  optimization  problem  it  suffices 
to  compare  the  MSE  of  [15]  to  that  of  sparse  RLS.  The 
regularization  parameter  A  was  chosen  to  achieve  the  lowest 
steady-state  error,  resulting  in  A  =  0.05.  It  is  important  to  note 
that,  while  their  computational  complexity  is  lower  than  that 
of  our  proposed  RLS  algorithm  (and  lower  than  that  of  [15]), 
the  methods  of  [9]  and  [10]  only  approximate  the  exact  t\ 
penalized  solution. 

It  can  be  seen  from  Fig.  4  that  our  proposed  sparse  RLS 
method  outperforms  standard  RLS  and  sparse  RLS  in  both 
convergence  rate  and  steady-state  MSE.  This  demonstrates 
the  power  of  our  group  sparsity  penalty.  At  the  change  point 
of  200  iterations,  both  the  proposed  method  and  sparse  RLS 
of  [15]  show  superior  tracking  performances  as  compared  to 
the  standard  RLS.  We  also  observe  that  the  proposed  method 
achieves  even  smaller  MSE  after  the  change  point  occurs.  This 
is  due  to  the  fact  that  the  active  cluster  spans  across  group 
boundaries  in  the  initial  system  (Fig.  3  (a)),  while  the  active 
clusters  in  the  shifted  system  overlap  with  fewer  groups. 

Fig.  5  shows  the  average  number  of  critical  points  (ac¬ 
counting  for  both  trajectories  in  f3  and  A)  of  the  proposed 
algorithm,  i.e.,  the  number  k  as  defined  in  Section  III-D. 
As  a  comparison,  we  implement  the  iCap  method  of  [12], 
a  homotopy  based  algorithm  that  traces  the  solution  path  only 
over  A.  The  average  number  of  critical  points  for  iCap  is 
plotted  in  Fig.  5,  which  is  the  number  k'  in  Section  III-D.  Both 
the  proposed  algorithm  and  iCap  yield  the  same  solution  but 
have  different  computational  complexities  proportional  to  k 
and  k' ,  respectively.  It  can  be  seen  that  the  proposed  algorithm 
saves  as  much  as  75%  of  the  computation  costs  for  equivalent 
performance. 

We  next  investigated  the  effect  of  adding  a  second  group  of 
non-zero  coefficients  to  w*  and  compare  different  algorithms 
for  group-sparse  RLS  algorithms.  Fig.  6(a)-(b)  shows  these 
100  coefficients  before  and  after  a  relocation  of  both  the 


two  groups  in  reverse  directions  at  the  200th  time  point.  The 
parameter  settings  and  group  allocation  setting  for  our  RLS 
algorithm  are  the  same  as  before.  The  regularization  parameter 
A  is  set  to  0.03  for  the  proposed  method.  For  comparison, 
we  utilized  the  SpaRSA  algorithm  [19]  implemented  in  a 
public  software  package1  to  solve  group-sparse  regularized 
RLS  problems  in  an  online  manner.  The  SpaRSA  algorithm 
is  an  efficient  iterative  method  in  which  each  step  is  obtained 
by  solving  an  optimization  subproblem,  and  it  supports  warm- 
start  capabilities  by  choosing  good  initial  values.  In  our  online 
implementation,  the  current  estimate  is  used  as  the  initial  value 
for  solving  the  next  estimate  using  the  updated  measurement 
sample.  Both  l\t00  and  £12  regularizes  are  employed  in  the 
SpaRSA  algorithm.  The  regularization  parameter  A  of  £il00- 
SpaRSA  is  set  to  the  same  value  as  in  the  proposed  method 
and  the  A  for  2-SpaRSA  is  tuned  for  best  steady-state  MSE. 
As  SpaRSA  is  an  iterative  algorithm,  its  solution  is  dependent 
on  the  stopping  rule,  which  is  selected  based  on  stopping 
threshold  on  the  relative  change  in  the  objective  value.  Two 
stopping  threshold  values,  10~3  and  10-5,  are  set  for  £ij00- 
SpaRSA,  respectively,  and  10-5  is  set  to  £\ .^-SpaRSA, 

The  MSE  performance  comparison  is  shown  in  Fig.  7  and 
the  averaged  computational  time  performed  on  an  Intel  Core 
2.53GHz  CPU  is  plotted  in  Fig.  8.  It  can  be  observed  from  Fig. 
7  and  Fig.  8  that  both  the  MSE  performance  and  computational 
time  of  SpaRSA  algorithms  depend  on  the  stopping  threshold. 
This  is  due  to  the  fact  that  SpaRSA  obtains  approximate 
solutions  of  the  regularized  RLS  problem  rather  than  the 
exact  solution  attained  by  the  proposed  method.  For  £lj00- 
SpaRSA,  a  large  threshold  value  implies  faster  run  time  but 
less  tracking  performance  in  MSE  as  compared  to  a  smaller 
stopping  threshold.  We  observed  that  the  MSE  curve  of  the 
more  computationally  demanding  t-\  lOC-SpaRSA  will  approach 
to  that  of  the  proposed  method  when  the  stopping  threshold 
is  very  large  as  the  two  algorithms  solve  the  same  optimiza¬ 
tion  problem  (results  not  shown  here).  For  £|  ,2-SpaRSA,  the 
algorithm  achieves  comparable  computational  time  with  the 
proposed  method  with  the  specified  stopping  threshold,  while 

'Available  from  http://www.lx.it.pt/~mtf/SpaRSA/ 
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Fig.  6.  Coefficients  for  the  spurious  group  misalignment  experiment  shown 
in  Figs.  7  and  8.  (a):  Initial  response,  (b):  Response  after  the  200th  iteration. 
The  groups  for  the  recursive  group  sparse  RLS  algorithm  (Algorithm  1)  were 
chosen  as  20  equal  size  contiguous  groups  of  coefficients  partitioning  the 
range  1, . . . ,  100. 


Fig.  7.  Averaged  MSE  of  the  proposed  algorithm,  RLS  and  SpaRSA  al¬ 
gorithms  [19].  The  stopping  thresholds  for  ^i,oo -SpaRSA  (a),  ^i,oo -SpaRSA 
(b)  and  ^i^-SpaRSA  are  set  to  10— 3,  10-5  and  10— 5,  respectively.  The 
corresponding  system  response  is  shown  in  Fig.  6. 

the  MSE  performance  of  £i:2-SpaRSA  is  worse  than  that  of 
the  proposed  method. 

Note  that  the  system  response  before  the  200th  iteration  is 
perfectly  aligned  with  the  group  allocation  but  is  shifted  out 
of  alignment  after  the  change  point.  The  difference  between 
the  MSE  curves  of  the  proposed  method  before  and  after  the 
transition  suggests  that  our  proposed  recursive  group  lasso  is 
not  overly  sensitive  to  shifts  in  the  group-partition  even  when 
there  are  multiple  groups. 

V.  Conclusion 

In  this  paper  we  proposed  a  t\  jOC  regularized  RLS  algorithm 
for  online  sparse  linear  prediction.  We  developed  a  homotopy 
based  method  to  sequentially  update  the  solution  vector  as 
new  measurements  are  acquired.  Our  proposed  algorithm  uses 
the  previous  estimate  as  a  “warm-start”,  from  which  we 
compute  the  homotopy  update  to  the  exact  current  solution. 
The  proposed  algorithm  can  process  streaming  measurements 
with  time  varying  predictors  and  is  computationally  efficient 
compared  to  non-recursive  and  contemporary  warm-start  capa¬ 
ble  group  lasso  solvers.  Numerical  simulations  demonstrated 


Fig.  8.  Averaged  computational  time  of  the  proposed  algorithm,  RLS  and 
SpaRSA  algorithms  [19]  using  an  Intel  Core  2.53GHz  CPU.  The  stopping 
thresholds  for  ^1,00 -SpaRSA  (a),  ^1,00 -SpaRSA  (b)  and  ^1,2-SpaRSA  are  set 
to  10-3,  10-5  and  10 — 5,  respectively.  The  corresponding  system  response 
is  shown  in  Fig.  6. 

that  the  proposed  method  outperformed  the  standard  and  £\ 
regularized  RLS  for  identifying  an  unknown  group  sparse 
system,  in  terms  of  both  tracking  and  steady-state  mean 
squared  error. 

As  indicated  in  Section  IV,  the  MSE  performance  of  the 
proposed  method  depends  on  the  choice  of  group  parti¬ 
tion.  The  development  of  performance-optimal  data-dependent 
group  partitioning  algorithms  is  a  worthwhile  topic  for  future 
study.  Another  worthwhile  area  of  research  is  the  extension  of 
the  proposed  method  to  overlapping  groups  and  other  flexible 
partitions  [23]. 

VI.  Appendix 

The  authors  would  like  to  thank  the  reviewers  for  their 
valuable  comments  on  the  paper. 

VII.  Appendix 
A.  Proof  of  Theorem  1 

We  begin  by  deriving  (35).  According  to  (31), 

v'  =  H,_1(b'  —  Ae').  (42) 

As  S  and  (A,B,C,V,  Q)  remain  constant  within  [/3o,/3i], 

e'  =  e,  (43) 

b'  =  b  +  5dy ,  (44) 

and 

H'  =  H  +  SddT, 

where 

S  =  Pi  —  fio, 

H  and  b  are  calculated  using  S  within  [(3q,  /3i]  and  R(/3o)  and 
r(fio),  respectively.  We  emphasize  that  H  is  based  on  R(/3) 
and  is  different  from  Ho  defined  in  Theorem  1.  According  to 
the  Sherman-Morrison-Woodbury  formula, 

H,-i=H-1__L_(H-1d)(H-1d)T)  (45) 
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where  a2  =  d7  H  1d.  Substituting  (43),  (44)  and  (45)  into 
(42),  after  simplification  we  obtain 


v'  =  HT1  - 


(H_1d)(H_id)i  (b  +  Sdy  -  Ae) 


1  4-  cr25 

=  H~*(b  -  Ae)  +  H-^dy 

6  -  -  °2P 


1  +  cr2<5 

C 

=  v  +  - - —  (y  —  dTv)H_1d 


ddTH“1(b  -  Ae)  -  TT-^H^1dy 


1  +  a25 
5 

1  +  a25 


{y-y) h  :d, 


(46) 

where  y  ■■  d  7  v  as  defined  in  (40). 

Note  that  H  is  defined  in  terms  of  R(/3q)  rather  than  R(0) 
and 

H  =  H0+/30ddT, 


so  that 


TT  I  _  TT-1  _  Pa  T 

44  R-o  i  ,  2  o  > 

1  +  CT  Hp0 


(47) 


H~1d  =  Ho1d-  g. 

0  l  +  a2Hp0 ^ 


Accordingly, 


r2=dTU~1d  =  a2~  ~2  -  "H 


'HW  x  ~  UH 

Substituting  (48)  and  (49)  to  (46),  we  finally  obtain 

v'  -  'r  — S-2 -yriv  ~y) g  =  v  +  2°a  (y  -  y) s- 


1  +  crjjPi 


1  +  CfftPl 


P  = 


1  +  a2HPi ' 


1)  Critical  point  for  Condition  1:  Define  the  temporary 
vector 

tu  =  (y  -  y)  W  -  (R^S  R^)g}. 

According  to  (36), 

Az^  =  Az_4  +  pt_4. 

Condition  1  is  met  for  any  p  =  p^p  such  that 

p<I)  =  _xa,.e_4 

f'i 

Therefore,  the  critical  value  of  p  that  satisfies  Condition  1  is 

P(1)  =  min  jpf }  i  G  A,  pf]  G  (0,  1/<j2h)  )  . 

2)  Critical  point  for  Condition  2:  By  the  definition  (30),  v 
is  a  concatenation  of  am  and  wg  ,  m  G  V: 


vT  = 


where  g  and  a2H  are  defined  by  (39)  and  (41),  respectively. 
As  a%  =  dTg, 


(WmeP.w^.-.w^J,  (52) 


(48) 


_  _(TMo_  2  =  ah  (m 
H  l  +  a2p0  H  l+a2p0' 


Equations  (36)  and  (37)  can  be  established  by  direct  sub¬ 
stitutions  of  (35)  into  their  definitions  (32)  and  (34)  and  thus 
the  proof  of  Theorem  1  is  complete. 

B.  Computation  of  critical  points 

For  ease  of  notation  we  work  with  p,  defined  by 

Pi  ~  Po 


(50) 


It  is  easy  to  see  that  over  the  range  pi  >  /30,  p  is  monotonically 
increasing  in  (0,  \/a2H).  Therefore,  (50)  can  be  inverted  by 

ft  =  (51) 

1-°hP 

where  p  G  (0,  l/cr#)  to  ensure  Pi  >  P0. 

Suppose  we  have  obtained  p(k\k  =  1,...,4,  p[k ^  can  be 
calculated  using  (51)  and  the  critical  point  pi  is  then 

Pi  =  min  p\k\ 

fc=l,...,4 

We  now  calculate  the  critical  value  of  p  for  each  condition 
one  by  one. 


where  (am)TOg-p  denotes  the  vector  comprised  of  am,  m  G  V. 
Now  we  partition  the  vector  g  in  the  same  manner  as  (52)  and 
denote  rm  and  um  as  the  counter  part  of  am  and  wgm  in  g, 
i.e., 

ST  =  (WmEPi  uf , ...,  ufo)  • 

Eq.  (35)  is  then  equivalent  to 

tXjji  +  pTm,  (53) 

and 

4"  ptlm,ii 

where  urni  is  the  «-th  element  of  the  vector  uTO.  Condition  2 
indicates  that 

a'm  = 

and  is  satisfied  if  p  =  p^+P  or  p  =  p^~\  where 

(2+)  _  WBr^i  (2—)  _  H-  ^Bm,i 

Pm,i  —  —  .  _  >  Pm,i  ~  ~~  .  ,  • 

U'm,z  •  m  i  >m 

Therefore,  the  critical  value  of  p  for  Condition  2  is 

P(2)  =min{Pm?  meV,i=  1,  •••,  \Bm\,  Pm^  e  (M/Off)} 

3)  Critical  point  for  Condition  3:  According  to  (53),  a'  = 

(3) 

0  yields  p  =  p\  determined  by 


(3)  <^ra 

Pm  = - G  V, 

Tm 

and  the  critical  value  for  p^>  is 


pO  =  mm  • 


{p{m  ,mGP,  p ^  G  (0, 1/<72H)  )  . 
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Fig.  9.  An  illustration  of  the  fast  algorithm  for  critical  condition  4. 


4)  Critical  point  for  Condition  4:  Define 

tc  =  (y  ~  V)  {xc  ^  (RcaS  Rcb)  g}  • 
Eq.  (37)  is  then 


_  ^ZCm  +  P^Cm  1 


and  Condition  4  is  equivalent  to 

^2  \pti  +  Xzi\  =  X.  (54) 

*ecm 


To  solve  (54)  we  develop  a  fast  method  that  requires  complex¬ 
ity  of  0(N  log  N),  where  N  =  \Crn\.  The  algorithm  is  given 
in  Appendix  C.  For  each  m  £  Q,  let  pfn  be  the  minimum 
positive  solution  to  (54).  The  critical  value  of  p  for  Condition 
4  is  then 


m  6  Q,pW  €  (0,  l/crlr)  }  - 


Algorithm  2:  Solve  x  from  Y^Li  a,i\x  —  Xi\  =  y. 

Input  :  {ai,Xi}?=1,  y 
Output.  Xm[n,  XmSLX 

Sort  {xi}^=1  in  the  ascending  order:  X\  <  x2  <  ...  <  x^'. 
Re-order  {ai}^Lx  such  that  a i  corresponds  to  xp, 

Set  k0  =  ~  J2iLi  au 
for  i  =  1, ...,  N  do 
j  ki  —  ki—i  2 o,,, 

end 

Calculate  hi  =  YxU aAxi  —  xf; 
for  i  =  2, ...,  N  do 
j  hi  —  hi  — i  T  ki—l(Xi  Xi—  i) 

end 

if  min.j  k,  >  y  then 
No  solution; 

Exit; 

else 

if  y  >  hi  then 

|  •t'min  —  Xl  T  (y  hi'j/ko, 

else 

Seek  j  such  that  y  £  [hj,  hj- 1]; 

•t'min  =  Xj  +  (y  -  hj)/kj- 1; 

end 

if  y  >  hjy  then 

|  zmax  =  xN  +  {y  -  hN)/kN; 

else 

Seek  j  such  that  y  £  [hj- 1,  hj]; 

•t-max  —  Xj— 1  +  (y  hj  —  1 )  /  kj  —  ] , 

end 

end 


C.  Fast  algorithm  for  critical  condition  4 

Here  we  develop  an  algorithm  to  solve  problem  (54). 
Consider  solving  the  more  general  problem: 

N 

ai\x  -  Xj\  =  y,  (55) 

»= i 

where  a,  and  Xi  are  constants  and  ar  >  0.  Please  note  that 
the  notations  here  have  no  connections  to  those  in  previous 
sections.  Define  the  following  function 

N 

h(x)  =  ^ ~2ai\x  -  Xi\. 

i=  1 

The  problem  is  then  equivalent  to  finding  h~1{y),  if  it  exists. 

An  illustration  of  the  function  h(x)  is  shown  in  Fig.  9, 
where  ki  denotes  the  slope  of  the  ith  segment.  It  can  be  shown 
that  h(x)  is  piecewise  linear  and  convex  in  x.  Therefore,  the 
equation  (55)  generally  has  two  solutions  if  they  exist,  denoted 
as  .x'mj,,  and  xmax.  Based  on  piecewise  linearity  we  propose  a 
search  algorithm  to  solve  (55).  The  pseudo  code  is  shown  in 
Algorithm  2  and  its  computation  complexity  is  dominated  by 
the  sorting  operation  which  requires  O(NlogN). 
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